Method and Apparatus for performing synchronised audio and
video presentation

The invention relates to the synchronised presentation of
video and audio streams using non-synchronised processing
means.



Background 

MPEG-4 is an international standard developed by the Moving
Picture Experts Group (MPEG), which also developed a number
of other MPEG-type standards for compressing audio and video
data, for example MPEG-1 and MPEG-2. The encoded/compressed
data is treated as object data, and both video and audio
data are combined into a single bitstream. Since an MPEG-4
system is configured to treat data as object data, it is
easy to re-organise a received bitstream by separating it
into multiple single packets of data. An MPEG-4 player then
allows the audio and video data to be reproduced on a
computer or another device.

Invention

Even though the video encoding associated with the MPEG-type
standards provides high-resolution pictures, its use requires
one or more powerful, dedicated processors, for example a
digital signal processor, for encoding or decoding MPEG-type
video data. Processing an entire MPEG-type stream on a single
computer consumes nearly all of the computational resources
of that computer's general-purpose CPU (central processing
unit), thereby rendering the computer virtually useless for
any other purpose. Consequently, it is highly desirable for
MPEG-type data processing to be able to use a network of
remote computers or devices, processing the video stream on
one computer or device while sending the audio data to be
processed on a second computer or device, since this allows
any standard computer or device to be used for the video and
audio processing.

Thus, it would be desirable to use two processing means or
computers for rendering or presenting video and audio data,
wherein the video and audio streams need to be synchronised
for presentation.

A problem to be solved by the invention is to provide
synchronised presentation of video and audio using separate
devices, the operation of which is basically not synchronised
with each other. This problem is solved by the method
disclosed in claim 1. An apparatus that utilises this method
is disclosed in claim 8.

Advantageous additional embodiments of the invention are 
disclosed in the respective dependent claims. 

The inventive features described below are used for
synchronising presentation of audio data with the appropriate
video data utilising two processing means or computers. A
data stream comprising video and audio streams is received by
first processing means, the received data stream is separated
into video and audio streams, and the audio data packets of
the audio stream are timestamped by the first processing
means. Then, the audio data packets are forwarded to second
processing means, a local system time of the second
processing means is determined, and transmission time periods
of the audio data packets from the first processing means to
the second processing means are calculated based on the local
system time and the timestamps of the audio data packets.
Subsequently, synchronised audio and video
rendering/presentation based on the transmission time periods
is performed. Advantageously, the process of rendering is
accompanied by lowpass filtering the transmission time
periods, whereby a mean transmission time is obtained and
used for synchronisation of video and audio presentation. A
median filter can also be used for lowpass filtering the
measured transmission time periods in order to improve the
measurement result.

The present invention solves the above-mentioned problems of
the prior art, and provides a method capable of fast response
at start-up as well as high stability during processing. The
median filter is also largely insensitive to large measuring
errors.

An MPEG-type stream is separated into video data and audio
data, wherein the video data is processed on the first device
PC_A and the audio data is timestamped and forwarded to the
second device PC_B, which compares the received timestamp to
its local time. The difference is considered to be the
required transmission time. The internal time clocks of the
first processing device and the second processing device are
not synchronised.
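
As a minimal sketch of this comparison, the following Python
fragment computes the difference between the local time of
PC_B and the timestamp set by PC_A. The function name, the
millisecond unit and all numeric values (clock offset, network
delays) are assumptions chosen for illustration only. Because
the two clocks are not synchronised, the computed value
contains the unknown clock offset in addition to the actual
network delay.

```python
from typing import List


def apparent_transmission_time(timestamp_ms: float, local_time_ms: float) -> float:
    """Local time of PC_B minus the timestamp inserted by PC_A."""
    return local_time_ms - timestamp_ms


# Hypothetical values: PC_B's clock runs 5000 ms ahead of PC_A's clock.
CLOCK_OFFSET_MS = 5000.0
sent_ms: List[float] = [0.0, 20.0, 40.0, 60.0]   # timestamps set by PC_A
delay_ms: List[float] = [3.0, 4.0, 3.0, 12.0]    # simulated network delays

# Local time of PC_B at the moment each packet arrives.
arrival_ms = [s + CLOCK_OFFSET_MS + d for s, d in zip(sent_ms, delay_ms)]

periods_ms = [apparent_transmission_time(s, a) for s, a in zip(sent_ms, arrival_ms)]
print(periods_ms)
```

In this made-up run the last packet is delayed noticeably
longer than the others, which is the kind of variation the
subsequent filtering is meant to suppress.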

The time reference for synchronisation of the video and audio
streams is obtained by subtracting the mean transmission time
period from the local time of the second processing device
PC_B. Subsequently, additional lowpass filtering can be
performed by a digital filter such as a Butterworth filter
having a cut-off frequency below that of the high-frequency
motion (jitter) which needs to be eliminated.
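
The following Python sketch illustrates one possible form of
such a Butterworth lowpass stage together with the
subtraction that yields the time reference. The filter order
(second order), the bilinear-transform design, the cut-off
value and the test signal are assumptions; the text above
specifies none of them. The input is assumed to be the
sequence of mean transmission time periods.

```python
import math
from typing import List


def butterworth_lowpass(samples: List[float], cutoff: float) -> List[float]:
    """Second-order Butterworth lowpass designed via the bilinear transform.

    cutoff is the cut-off frequency as a fraction of the sample rate
    (0 < cutoff < 0.5).
    """
    if not 0.0 < cutoff < 0.5:
        raise ValueError("cutoff must lie between 0 and 0.5")
    if not samples:
        return []
    k = math.tan(math.pi * cutoff)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b1 = 2.0 * b0
    b2 = b0
    a1 = 2.0 * (k * k - 1.0) * norm
    a2 = (1.0 - math.sqrt(2.0) * k + k * k) * norm

    # Start in steady state at the first sample to avoid a start-up transient.
    x1 = x2 = y1 = y2 = samples[0]
    out: List[float] = []
    for x0 in samples:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def time_reference(local_time_ms: float, mean_transmission_ms: float) -> float:
    """Local time of PC_B minus the (smoothed) mean transmission time."""
    return local_time_ms - mean_transmission_ms


# Hypothetical input: 100 ms mean period with +/- 10 ms alternating jitter.
noisy = [100.0 + (10.0 if n % 2 == 0 else -10.0) for n in range(200)]
smooth = butterworth_lowpass(noisy, cutoff=0.05)
reference_ms = time_reference(local_time_ms=60000.0, mean_transmission_ms=smooth[-1])
```

With these assumed values the alternating jitter is removed
and the smoothed period settles close to 100 ms, so the time
reference settles close to 59900 ms.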



Drawings 

Exemplary embodiments of the invention are described with
reference to the accompanying drawings, which show in:
Fig. 1 Block diagram illustrating a network of first and
       second processing means configured to perform audio
       and video presentation;
Fig. 2 Flowchart of the inventive process.



Exemplary embodiments

In the block diagram of Fig. 1, showing a general
configuration of a multimedia computer network according to
the invention, reference numeral 100 denotes an MPEG-4 player
which sends an MPEG-4 data stream 102 to first processing
means PC_A 104, which include a video player 108. The
received MPEG-type stream comprises a system stream, a video
stream and an audio stream, which further contain video data
packets 116 and audio data packets 134.

A stream analysing stage 110 examines the streams, since the
system stream also includes the structure and the
configuration of the video and audio players. The first
computer PC_A 104 processes video data obtained from the
MPEG-4 video stream and displays it using e.g. an attached
monitor. The timestamping stage 112 checks the local time
clock 106 and inserts timestamps into the audio data packets.
A network 118, e.g. of type Ethernet (TCP/IP), connects the
first processing means 104 with second processing means 120,
e.g. a second computer PC_B, which processes the audio data
packets received from the first computer PC_A using audio
player 126. The time base 114 of the first computer 104 and
the time base 132 of the second computer 120 are not
synchronised with each other and have a tendency to drift
away from each other. The second computer or the network or
the first computer checks the local time clock 122 and
compares the received timestamp 124 to the local time of time
clock 122. The second computer or the network or the first
computer calculates the corresponding transmission time
periods.
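
The insertion and read-out of a timestamp can be sketched in
Python as follows. The packet layout (an 8-byte header holding
a clock reading in microseconds, in network byte order), the
function names and the choice of clock are assumptions made
for illustration; the description does not define a packet
format.

```python
import struct
import time
from typing import Tuple

# Hypothetical header: one unsigned 64-bit integer in network byte order,
# holding the sender's clock reading in microseconds.
_HEADER = struct.Struct("!Q")


def local_clock_us() -> int:
    """Read the local clock of this machine in microseconds."""
    return time.monotonic_ns() // 1000


def stamp_packet(payload: bytes, clock_us: int) -> bytes:
    """Timestamping stage on PC_A: prepend the clock reading to the payload."""
    return _HEADER.pack(clock_us) + payload


def read_packet(packet: bytes) -> Tuple[int, bytes]:
    """Receiving side on PC_B: split a packet into timestamp and payload."""
    (clock_us,) = _HEADER.unpack_from(packet)
    return clock_us, packet[_HEADER.size:]


packet = stamp_packet(b"audio-frame", local_clock_us())
sent_us, payload = read_packet(packet)
```

On the receiving side, the value `sent_us` would then be
compared with a fresh reading of the receiver's own clock.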

A median filter 128 can be used for lowpass filtering of the
transmission time periods in order to obtain a mean
transmission time period, which is in turn used for
synchronisation of audio and video rendering. A Butterworth
filter 130 provides additional lowpass filtering in order to
improve the final result.

MPEG-4 player 100 sends the MPEG-4 stream of data to the
first processing means PC_A, which processes the video data
and also forwards the actualised and timestamped audio data
packets to the second computer PC_B through the network.
After receiving audio data packets and also its configuration
from the first computer, the second computer PC_B compares
the received timestamp to the local time. The difference is
considered to be the transmission time period.

The time base of the video processing computer 104 is not
synchronised with the time base of the audio processing
computer 120. Also, the internal time clocks of the first and
the second computer are not synchronised and slowly drift
from each other. Thus, the timestamps received by the second
computer can be considered as being altered with respect to
their value, because the real transmission time cannot be
specified exactly. This may have different reasons, for
example: traffic on the network line or lines, configuration
of TCP/IP and Ethernet, thread processing of the operating
system, the varying amount of data, etc. In order to
synchronise the presentation of audio data with the
appropriate video data, the time difference between the
sending of the packets and their receipt is calculated. This
difference is then filtered with a median filter.
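
A short Python comparison, using made-up measurement values,
illustrates why a median is less affected by a single heavily
delayed packet than an arithmetic mean would be:

```python
import statistics

# Hypothetical measured periods in ms; one packet was delayed heavily.
periods_ms = [5.0, 6.0, 5.0, 250.0, 6.0]

mean_ms = statistics.mean(periods_ms)
median_ms = statistics.median(periods_ms)

print(f"mean = {mean_ms:.1f} ms, median = {median_ms:.1f} ms")
# prints "mean = 54.4 ms, median = 6.0 ms"
```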

A median filter is a time-discrete, non-linear filter which
stores the acquired samples, sorts them and provides the
middle sample (or the average of the two middle samples in
case of an even number of input values) as the output of its
operation. The median filter used for the invention is very
flexible with respect to the number of input samples it
processes. Initially all sample values are set to zero. After
having collected a pre-defined first number of samples, e.g.
19, the median filter starts outputting the mean transmission
time, whereby the length of the median filter corresponds to
said first number. As an option, upon receiving further input
samples, the filter length used is increased by one per
additional input sample received, up to a pre-defined maximum
length, e.g. 499. Thereby both a fast reaction time at
start-up and a stable continuous operation can be achieved.
Subsequently, additional lowpass filtering can be performed
by a digital filter such as a Butterworth filter having a
cut-off frequency below that of the high-frequency motion
(jitter) which needs to be eliminated. This kind of operation
allows synchronising the video and audio presentation with
respect to time and thus eliminating discontinuities in the
time bases of the two computers.
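
The median filter with growing length described above can be
sketched in Python as follows. The class and method names are
hypothetical, and the sketch deviates from the text in one
respect: instead of pre-setting all sample values to zero, it
starts with an empty buffer and returns no output until the
first number of samples has been collected.

```python
import statistics
from collections import deque
from typing import Deque, Optional


class GrowingMedianFilter:
    """Median filter whose length grows from first_length up to max_length."""

    def __init__(self, first_length: int = 19, max_length: int = 499) -> None:
        if not 1 <= first_length <= max_length:
            raise ValueError("require 1 <= first_length <= max_length")
        self.first_length = first_length
        # The deque silently drops the oldest sample once max_length is reached.
        self._window: Deque[float] = deque(maxlen=max_length)

    def update(self, period: float) -> Optional[float]:
        """Add one measured transmission time period and return the median.

        Returns None until first_length samples have been collected.
        """
        self._window.append(period)
        if len(self._window) < self.first_length:
            return None
        # statistics.median sorts the samples and averages the two middle
        # values when the number of samples is even.
        return statistics.median(self._window)


# Small lengths are used here only to keep the example readable.
median_filter = GrowingMedianFilter(first_length=3, max_length=5)
outputs = [median_filter.update(v) for v in [10.0, 12.0, 11.0, 1000.0, 13.0, 14.0]]
print(outputs)  # [None, None, 11.0, 11.5, 12.0, 13.0]
```

The outlier of 1000.0 never appears in the output, and the
window stops growing once it holds five samples.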

In step 200 in the flow chart of Fig. 2, an MPEG-type stream
comprising video and audio streams is received by the first
processing means PC_A. In the next step 202, said MPEG-type
data stream is separated into the video and the audio
streams, wherein the first processing computer PC_A
containing the video player processes the video stream and
the second processing computer PC_B containing the audio
player processes the audio stream. Subsequently, in step 204,
the audio data packets are timestamped by the video
processing computer PC_A and forwarded to the audio
processing computer PC_B, which is configured to receive
audio data from the video processing computer PC_A. In the
next step 206, the local system time of the audio processing
computer is determined. Next, the audio stream transmission
time periods from the first processing means to the second
processing means are calculated in step 208. In the last step
210, the audio and video presentation is synchronised based
on the calculated transmission time periods.
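
Steps 200 to 210 can be walked through with a small Python
simulation. Everything in it is an assumption made for
illustration: the packet representation, the function names,
the fixed clock offset, the random network delay and the use
of a plain median over all measured periods in step 210.

```python
import random
import statistics
from typing import List, Tuple

Packet = Tuple[str, bytes]  # (stream kind, payload); hypothetical stand-in


def separate(stream: List[Packet]) -> Tuple[List[bytes], List[bytes]]:
    """Step 202: split the received stream into video and audio payloads."""
    video = [data for kind, data in stream if kind == "video"]
    audio = [data for kind, data in stream if kind == "audio"]
    return video, audio


def timestamp_audio(audio: List[bytes], start_ms: float,
                    interval_ms: float) -> List[Tuple[float, bytes]]:
    """Step 204: attach a PC_A clock reading to every audio packet."""
    return [(start_ms + i * interval_ms, data) for i, data in enumerate(audio)]


def measure_periods(stamped: List[Tuple[float, bytes]], offset_ms: float,
                    rng: random.Random) -> List[float]:
    """Steps 206 and 208 on PC_B, using a simulated network."""
    periods = []
    for sent_ms, _ in stamped:
        delay_ms = rng.uniform(2.0, 6.0)            # simulated network delay
        local_ms = sent_ms + offset_ms + delay_ms   # step 206: PC_B local time
        periods.append(local_ms - sent_ms)          # step 208: period
    return periods


def audio_time_reference(local_ms: float, periods: List[float]) -> float:
    """Step 210: local time of PC_B minus the median transmission period."""
    return local_ms - statistics.median(periods)


# Step 200: a made-up stream of alternating video and audio packets.
stream: List[Packet] = [
    ("video" if i % 2 == 0 else "audio", f"packet-{i}".encode())
    for i in range(40)
]

video, audio = separate(stream)
stamped = timestamp_audio(audio, start_ms=1000.0, interval_ms=20.0)
periods = measure_periods(stamped, offset_ms=5000.0, rng=random.Random(1))
reference_ms = audio_time_reference(7000.0, periods)
```

Under these assumptions `reference_ms` lies between 1994 ms
and 1998 ms, i.e. PC_B's local time of 7000 ms reduced by the
clock offset plus a typical network delay.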

Instead of synchronising presentation of audio data with the
appropriate video data, synchronising the presentation of
video data with the appropriate audio data can be performed.
In such case video data packets of the video stream are
timestamped by the first processing means and the video data
packets are forwarded to the second processing means
configured to receive audio data packets. Time periods are
calculated for the transmission of the video data packets
from the first processing means to the second processing
means, based on the corresponding local system time and the
timestamps of the video data packets.



Claims

1. Method for performing audio and video presentation
including the steps of:

receiving a data stream including video and audio
streams;

separating said data stream into video and audio
streams;

timestamping audio data packets of said audio stream by
first processing means and forwarding audio data packets
to second processing means configured to receive audio
data packets;

determining a local system time of said second
processing means;

calculating time periods for the transmission of audio
data packets from said first processing means to said
second processing means, based on said local system time
and said timestamps of the audio data packets;

synchronising audio and video presentation based on said
calculated transmission time periods.

2. Method according to claim 1, wherein timestamping of the
audio data packets by the first processing means is
performed using an internal time clock of the first
processing means.

3. Method according to claim 1 or 2, wherein the time ref- 
erence of the audio presentation is obtained by sub- 
tracting the transmission time period from the local 
time of the second processing means. 

4. Method according to one of claims 1 to 3, wherein the
calculation of a transmission time period is based on a
plurality of audio data packets sent from the first
processing means to the second processing means.

5. Method according to one of claims 1 to 4 wherein, when
calculating said transmission time periods, the
calculated transmission time periods are median filtered in
order to obtain a mean transmission time period,
wherein, as an option, the length of said median
filtering is changed dynamically, starting with a pre-defined
first number of input transmission time period values
and increasing in conformity with the number of further
received transmission time period values, up to a
pre-defined maximum number of input transmission time period
values.

6. Method according to claim 5, wherein said mean
transmission time period is used for synchronisation of audio
and video presentation.

7. Method according to claim 5 or 6, wherein the
accumulated transmission time period values are sorted for
said filtering.

8. System for performing audio and video presentation
including:

means for receiving a data stream including video and
audio streams;

means for separating said data stream into video and
audio streams;

means for timestamping audio data packets of said audio
stream by first processing means and forwarding audio
data packets to second processing means configured to
receive audio data packets;

means for determining a local system time of the second
processing means;

means for calculating time periods for the transmission
of audio data packets from the first processing means to
the second processing means, based on the local system
time and said timestamps of the audio data packets;

means for synchronising audio and video presentation
based on said calculated transmission time periods.

9. System according to claim 8 wherein, when calculating
said transmission time periods, the calculated
transmission time periods are median filtered in order to
obtain a mean transmission time period, wherein, as an option,
the length of the median filter is changed dynamically,
starting with a pre-defined first number of input
transmission time period values and increasing in conformity
with the number of further received transmission time
period values, up to a pre-defined maximum number of
input transmission time period values.

10. Computer-readable storage medium holding code for
performing the steps of:

receiving a data stream including video and audio
streams;

separating said data stream into video and audio
streams;

timestamping audio data packets of said audio stream by
first processing means and forwarding audio data packets
to second processing means configured to receive audio
data packets;

determining a local system time of the second processing
means;

calculating time periods for the transmission of audio
data packets from the first processing means to the
second processing means, based on the local system time and
said timestamps of the audio data packets;

synchronising audio and video presentation based on said
calculated transmission time periods.



A synchronisation of the presentation of video data with
audio data is described, the video data and the audio data
being processed on two separate, non-synchronised computers.
An MPEG-type stream is separated into video data and audio
data, wherein the video data is processed utilising the first
processing means and the audio data is timestamped and
forwarded to the second processing means, which compares the
received timestamp to the local time. The transmission time
periods of sending audio data packets from the first
processing means to the second processing means are
calculated based on the local system time and the timestamp
inserted into the audio data packets. Subsequently,
synchronised audio and video presentation is performed.